Create versioned releases of pandoc-wasm (released as wasm-pandoc on npmjs.com) #10

johanneswilm · 2025-02-11T09:22:43Z

This is the kind of thing I was thinking of in #9. The changes are basically:

Include a patch to the pandoc sources and link to the original pandoc repository. No need to maintain a fork of the pandoc sources. Only the patch may need to be adjusted every now and then.
Publish versioned release files (for example, by using {pandoc-version}.{pandoc-wasm-version}). Currently that would be 3.6.3.0.

Ideally, these releases would also be pushed to something like npm or some other repository for wasm files.

I understand a lot is going on in the wasm space and you expect changes on how this is done in the not too distant future. But meanwhile, these releases already serve a purpose so it would make sense to distribute them.

alerque · 2025-02-11T09:52:47Z

.github/workflows/build.yml

-          repository: haskell-wasm/pandoc
-          ref: wasm
+          repository: jgm/pandoc
+          ref: main


If you are going to put a version number on these builds, shouldn't they also be pinned to a specific commit (probably a release tag) for the underlying Pandoc version too?

@alerque Yes, you are right. I am trying to figure out how best to do this. It should probably be easy to get it to build with the most recent pandoc version, and we probably need another digit in the number to show the version of the pandoc-wasm package: 3.6.3.x instead of just 3.6.3. If you have a proposal for how to do the versioning in the simplest and most standards-compliant way, I'm all for it.

Just adding a segment will make it hard to parse because Pandoc follows PVP and has a variable number of segments. I think you're going to need a different segment operator that plays nice with distro versioning (I think + is the most robust option). The question is probably does it make more sense to version this project first, then append the relevant Pandoc version, or the other way around? e.g. pandoc-wasm-3.6.3+0.1 or pandoc-wasm-0.1+3.6.3. I think the latter probably makes more sense but it depends on the expected release channels and use workflows I guess. I don't really have a handle on that.
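To make the parsing concern concrete, here is a minimal sketch of splitting a combined version at the `+` separator. The helper name `parseCombinedVersion` and the returned shape are hypothetical, not part of this PR; the point is only that neither side needs a fixed number of segments.

```javascript
// Hypothetical helper: split "<wrapper-version>+<pandoc-version>" at the "+".
// Neither side is assumed to have a fixed segment count, because pandoc
// follows PVP and its versions may have three or four segments.
function parseCombinedVersion(combined) {
  const plus = combined.indexOf("+");
  if (plus === -1) {
    throw new Error(`missing "+" separator in "${combined}"`);
  }
  const toSegments = (text) =>
    text.split(".").map((segment) => {
      if (!/^\d+$/.test(segment)) {
        throw new Error(`non-numeric segment "${segment}" in "${combined}"`);
      }
      return Number(segment);
    });
  return {
    wrapper: toSegments(combined.slice(0, plus)),
    pandoc: toSegments(combined.slice(plus + 1)),
  };
}

const parsed = parseCombinedVersion("0.1+3.6.3");
console.log(parsed.wrapper, parsed.pandoc);
```

With a purely dotted scheme such as 3.6.3.0, the same function has no reliable way to tell where the pandoc version ends, which is why it rejects input without a `+`.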

@alerque OK, as someone working mainly with JS/TS in browsers, I would expect it to become available on npm. But I'm open to others needing it in other places, maybe? I like both of your versioning proposals, and unless there is a reason not to, I'd go for the second one.

@alerque This now seems to be working with tagged versions, etc. The only thing remaining is to put it on npm. As I assume that @TerrorJack or @tweaf or @alerque will want to have control over the npm repository after merging this PR (or writing something similar), I will not add that part.

@alerque It looks like npm does not allow this versioning scheme. It only allows three-part semver. So I split the two version numbers apart. It's still really easy to update the pandoc version though.
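As a rough illustration of the constraint, the sketch below checks for a plain MAJOR.MINOR.PATCH core and keeps the pandoc version in a separate field. The regular expression deliberately ignores semver pre-release and build-metadata suffixes, and the `release` object with its field names is hypothetical rather than the actual layout used in this PR.

```javascript
// Three dot-separated numeric segments, no leading zeros. This covers only
// the MAJOR.MINOR.PATCH core; pre-release and build suffixes are ignored
// in this sketch.
const CORE_VERSION = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$/;

function isThreePartVersion(version) {
  return CORE_VERSION.test(version);
}

// Hypothetical release descriptor: the package keeps its own three-part
// version, and the pandoc version it was built from is tracked separately.
const release = { packageVersion: "0.8.0", pandocVersion: "3.7.0.1" };

console.log(isThreePartVersion(release.packageVersion));
console.log(isThreePartVersion(release.pandocVersion));
```

Keeping the two numbers apart means a four-segment pandoc version never has to fit into the package version.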

johanneswilm · 2025-02-11T13:30:56Z

@georgestagg It looks like your archived pandoc-wasm package is registered with npm. Could we maybe put this in its place?

georgestagg · 2025-02-11T13:41:45Z

Yes, you have my permission. Which npm username should I invite as a maintainer for the package?

johanneswilm · 2025-02-11T14:36:18Z

@georgestagg I'll only do it if none of those who really developed this package will do it. My username is johanneswilm. But it would be better if @alerque @TerrorJack or someone else here would do it. I know very little about Haskell and wasm.

amesgen · 2025-02-11T19:18:26Z

patch/pandoc.patch

I see that you used haskell-wasm/pandoc#1 to create this, makes sense 👍

Now that I see the minimized diff again, I realize that the only remaining thing that is actually patched in the Haskell code here is the addition of the wasm_main export, such that the same .wasm file can be used both as a command module (e.g. locally via wasmtime) and as a reactor (for the web app). While this is quite neat, having separate .wasm files for these would have the following advantages:

Apart from the removal of -threaded (which could be upstreamed behind an arch(wasm32) conditional), no pandoc patching would be necessary; one could just build stock pandoc-cli (of course with an appropriate cabal.project for various dependencies).

In the separate build that exposes an FFI, one could actually use the full Wasm JSFFI which is more convenient.
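For readers unfamiliar with the command/reactor distinction, here is a small sketch of how one might tell the two apart from a module's exports. It assumes the usual WASI convention that a command exports `_start` and a reactor typically exports `_initialize`; the function name `wasiEntryPoints` and the sample export list are made up for illustration and do not describe the actual pandoc.wasm.

```javascript
// Takes export descriptors in the shape returned by
// WebAssembly.Module.exports(module): [{ name, kind }, ...].
function wasiEntryPoints(exportDescriptors) {
  const functionNames = new Set(
    exportDescriptors
      .filter((descriptor) => descriptor.kind === "function")
      .map((descriptor) => descriptor.name)
  );
  return {
    command: functionNames.has("_start"), // run-once entry point
    reactor: functionNames.has("_initialize"), // set up, then call exports
  };
}

// Illustrative export list only; not read from a real build.
const sampleExports = [
  { name: "memory", kind: "memory" },
  { name: "_initialize", kind: "function" },
  { name: "wasm_main", kind: "function" },
];
console.log(wasiEntryPoints(sampleExports));
```

In a browser or Node.js, the descriptor list for a compiled module can be obtained with `WebAssembly.Module.exports(module)`.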

@johanneswilm In your use case, do you use the .wasm file as a command or a reactor module?

@amesgen That sounds promising. Yes, I took your patch in order to make the diff smaller.

My use case is this: I have created an open source word processor similar to Google Docs/Microsoft Word 365 Online, etc. for a specific niche market. I have written a number of export filters myself to common formats (like DOCX, ODT, HTML, EPUB, JATS, etc.). These are written in JS and all run in the end user's browser. I have even written an import filter for ODT files that works the same way.

Now I have a number of users wanting to import and export from other, more exotic formats. So I've created import/export filters in JS to the pandoc internal json format. And I then let the client send the pandoc json to a server where pandoc is run in server mode, converting the json to one of the other formats and sending it back to the client.
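As a hedged illustration of what such a JS export filter produces, the sketch below builds a tiny document in pandoc's JSON representation from plain-text paragraphs. The function name is hypothetical, and the `pandoc-api-version` value is an assumption that should be checked against what `pandoc -t json` emits for the pandoc build in use.

```javascript
// Build a minimal pandoc JSON document: one Para block per input string,
// with words as Str inlines separated by Space inlines.
function paragraphsToPandocJson(paragraphs) {
  const toInlines = (text) => {
    const inlines = [];
    text
      .split(/\s+/)
      .filter((word) => word.length > 0)
      .forEach((word, index) => {
        if (index > 0) inlines.push({ t: "Space" });
        inlines.push({ t: "Str", c: word });
      });
    return inlines;
  };
  return {
    // Assumed API version; adjust to match the pandoc version in use.
    "pandoc-api-version": [1, 23, 1],
    meta: {},
    blocks: paragraphs.map((text) => ({ t: "Para", c: toInlines(text) })),
  };
}

console.log(JSON.stringify(paragraphsToPandocJson(["Hello world"])));
```

The resulting JSON string is what would be handed to pandoc with `-f json`, whether pandoc runs on a server or in the browser.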

This is a bit problematic as these conversions take up processing power on the server and it's quite complex to deploy pandoc to a number of different architectures for which there are no pre-compiled binaries. There could even be security issues about sending various files back and forth.

So my idea is to use this instead: when the user clicks on export to or import from an "exotic" format for the first time, the browser downloads the pandoc.wasm file (cached for future use) and then does the conversion in the user's own browser, using the user's own processing power rather than the server's.
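A minimal sketch of the download-on-first-use idea follows. It assumes the caller supplies a `fetchBytes` function returning a promise of the module bytes; `makeLazyWasmLoader` is a hypothetical name, and the in-memory memoization shown here is separate from whatever the browser's HTTP cache does across sessions.

```javascript
// Returns a load() function that downloads and compiles the module on the
// first call and hands back the same promise on every later call.
function makeLazyWasmLoader(fetchBytes) {
  let pending = null;
  return function load() {
    if (pending === null) {
      pending = fetchBytes().then((bytes) => WebAssembly.compile(bytes));
      // Forget a failed attempt so a later call can retry the download.
      pending.catch(() => {
        pending = null;
      });
    }
    return pending;
  };
}

// Demonstration with the smallest valid module (header only) instead of a
// real download, so the sketch runs anywhere WebAssembly is available.
const emptyModuleBytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
const loadDemoModule = makeLazyWasmLoader(() => Promise.resolve(emptyModuleBytes));
loadDemoModule().then((module) => console.log(module instanceof WebAssembly.Module));
```

In a browser, `fetchBytes` could be something like `() => fetch(url).then((response) => response.arrayBuffer())`, where `url` points at wherever the application hosts pandoc.wasm.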

I assume you are referring to the terms "reactor" and "command" as they are defined here [1]. Given that I only want to execute the conversion based on one input and then to close down again (only caching the binary so it doesn't have to be downloaded again), I assume this corresponds to pandoc-cli more than pandoc-server.

I haven't yet tried whether it is actually possible. Based on the web demo it seems like it should though.

I don't know the who-is-who of the haskell-wasm world either. So I don't know which one of all of you can make any decisions here and who would be a good candidate to maintain an npm package of pandoc-wasm. I do maintain several open source packages, but those are written in languages that I use daily (like JS/TS or Python). So if one of you would want to step forward and do this, I'd be in favor.

[1] WebAssembly/WASI#13 (comment)


johanneswilm · 2025-03-25T17:51:35Z

@TerrorJack I had to pin ghc-wasm-meta to a specific git commit as your recent changes there seem to have broken the pandoc wasm build. By pinning ghc-wasm-meta, I was able to build it with pandoc 3.6.4 (latest).

NiklasEi · 2025-05-05T14:07:55Z

src/index.js

+        wasi_snapshot_preview1: wasi.wasiImport
+    })
+
+    wasi.initialize(instance)


Is it a good idea to initialize a new instance for every call to pandoc? I am not 100% sure it is related, but I keep running into out of memory errors with this approach.

My use case is an HTML WYSIWYG editor that converts its HTML to Typst on every change, so the pandoc method is called a lot.

The previous script reused one instance for all conversions and I did not run into any out of memory errors.

Interesting. So that means that previous instances are not garbage collected? Do you happen to keep references to the old instances anywhere?

Why the change - it has been a while since I wrote this part. Let me try to reconstruct why it was done this way.

@NiklasEi I think it may have to do with the fact that the conversion process can produce several more output files, for example image files with differing names. So if you reuse the instance, the media folder will contain the images from both the previous and the current run. I don't know this wasi shim well enough to say whether it's possible to "clean" the folders after each run. That may be possible, but I didn't find much documentation at the time.

How exactly does the out-of-memory issue manifest? Do you get a popup from the browser saying it is out of memory? If you keep the instance around, you'd have to make sure everything is synchronous (no worker, and conversion 2 has to wait until conversion 1 is fully done) and handle the cleanup. And you'd have to keep at least one wasi instance in memory at all times, which I understand is ok in your case.
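The "conversion 2 has to wait until conversion 1 is fully done" requirement for a reused instance can be sketched with a small promise chain. This is an illustrative pattern under the assumption that each conversion is wrapped in a function returning a value or a promise; `makeSerialQueue` is a hypothetical name, not something in src/index.js.

```javascript
// Returns an enqueue(task) function. Each task starts only after every
// previously enqueued task has settled, so a shared instance is never
// used by two conversions at once.
function makeSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const result = tail.then(() => task());
    // Keep the chain alive even if this task fails, so later tasks still run.
    tail = result.then(
      () => undefined,
      () => undefined
    );
    return result;
  };
}

// Usage: the second task is queued immediately but runs after the first.
const enqueueConversion = makeSerialQueue();
const finished = [];
enqueueConversion(
  () => new Promise((resolve) => setTimeout(() => { finished.push("first"); resolve(); }, 10))
);
enqueueConversion(() => { finished.push("second"); }).then(() => console.log(finished.join(", ")));
```

Any per-run cleanup of the shared instance's file system would go at the end of each task, before the next one is allowed to start.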

I use it for something fairly similar, though I only convert upon user request. And if the garbage collection is working OK, then the instance should disappear again from memory after the conversion. This should be even more important if you do lots of conversions.

I am not reading or writing any files other than input/output.

Here is a screenshot of my console. I cannot share the stacktrace but there doesn't seem to be anything interesting in it anyways.

Hey @NiklasEi

> I am not reading or writing any files other than input/output.

I understand, but some of the transformations will create these extra files in the internal file system.

As for my other question: are you keeping references to anything related to running this in your JavaScript code? I am asking because keeping a reference means that the garbage collector cannot release the memory consumed by a previous instance, AFAIU.

> I cannot share the stacktrace but there doesn't seem to be anything interesting in it anyways.

Ok, so you are working on a closed source competitor to my open source software and want my help in debugging it without being able to share many details? :)

Seriously though, I understand that you may have a reason to do things closed source. But that also makes it difficult to help for people outside of your company.

As I understand it, you are not even entirely sure why you have a problem and whether it is related to this at all. I have incorporated a ton of changes in this release, this being one of many. It doesn't really make sense for me to spend another day or three on reverting this change and then solving all the resulting problems in some other way, only for you to then discover that your issue is something entirely different.

My recommendation would be for you to try to create a minimal example that you can publish openly that exhibits the problem and then I can also help you with resolving the issue you are facing if it turns out to be a bug in this library.

Please file future issues with https://github.com/fiduswriter/wasm-pandoc/tree/main

FWIW, I did some memory testing with Fidus Writer using the Chromium development console memory tab. Memory usage goes up by about 130MB during conversion with pandoc (236MB to 366MB). When the conversion is done, the entire 130MB are released and it goes back down to 236MB. I've tried exporting a number of times - the memory usage always goes back down to the same amount. But I made sure not to keep any reference to a previous instance of a previous run anywhere in my code so that the garbage collector can release the memory consumed by pandoc.
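The "do not keep a reference to a previous instance" point can be sketched as follows. This is only an illustration of the scoping idea, assuming a precompiled `WebAssembly.Module`; `withFreshInstance` is a hypothetical helper and not the code in src/index.js, and whether memory is actually reclaimed still depends on the engine's garbage collector.

```javascript
// Instantiate the module for a single use. The instance is only reachable
// through the local variable and the callback, so once the callback's
// result has been returned nothing here keeps the instance alive.
async function withFreshInstance(module, imports, use) {
  const instance = await WebAssembly.instantiate(module, imports);
  return use(instance);
}

// Demonstration with the smallest valid module (header only, no exports).
const demoModule = new WebAssembly.Module(
  new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00])
);
withFreshInstance(demoModule, {}, (instance) => Object.keys(instance.exports).length).then(
  (exportCount) => console.log(exportCount)
);
```

The callback should return plain data (for example the converted text) rather than the instance, its memory, or closures over them; otherwise the caller ends up holding exactly the kind of reference described above.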

johanneswilm and others added 11 commits February 10, 2025 22:53

update pandoc repo

31d06a4

Update links

75fbbed

Update README.md

c92ad77

use patch for build

3259e24

Merge branch 'master' of github.com:johanneswilm/pandoc-wasm

41623ec

clarify readme wording

9d68a29

Upload releases to ga run when not tagged

9f59d23

actions/upload-artifact@v4

15afec9

fix filename

4d9de2e

change reference to VERSION

25ad35c

use env.VERSION

e1317d4

alerque suggested changes Feb 11, 2025


johanneswilm and others added 11 commits February 11, 2025 11:32

use tagged pandoc version

94f0e68

reorganize github actions

a92165e

switch github events release type

4d78291

github action events change

d99cc8b

0.2+3.6.3

5245d39

Add package.json

dd1429e

fix package.json syntax

59f7aeb

0.3+3.6.3

c59abb7

reorganize zipping

5632d85

Merge branch 'main' of github.com:johanneswilm/pandoc-wasm

d633022

Use semver (requirement by npm)

23ce56a

set up for distribution via npm

ac6efee

amesgen reviewed Feb 11, 2025


johanneswilm added 2 commits February 13, 2025 14:56

Add maintainership. make exports work, add example to readme.

6cf1ed2

lint

055a9dc

johanneswilm added 19 commits February 27, 2025 15:04

fix url

f3e28b7

pre-commit==4.1.0

5e0b198

switch to pre-commit action

d2a5624

style changes

6d3a56e

lint: add quotes

82ff51a

always deploy pages

9fe1336

0.6.3

0331c8c

output github ref name

eb0ce81

0.6.4

bd5849e

github pages not for tag

d3fe83a

different ref reference

de5a283

reset tagging, giving up

5567650

publish on release - does not seem to work with tag

b4f0480

0.6.5

a866526

Revert "publish on release - does not seem to work with tag"

29f2328

This reverts commit b4f0480.

test publishign with tags again

7514486

0.6.6

19aa032

pandoc 3.6.4

4a41420

0.7.0

f3aa330

johanneswilm requested a review from alerque March 24, 2025 22:19

johanneswilm added 3 commits March 24, 2025 23:47

update patch

59297a8

pin ghc-wasm-meta

2935bcc

0.7.1

65d5776

NiklasEi reviewed May 5, 2025


johanneswilm added 2 commits May 26, 2025 21:57

add repo links

537b73e

pandoc 3.7.0.1

0c1b15b

johanneswilm changed the title ~~Create versioned releases of pandoc-wasm~~ Create versioned releases of pandoc-wasm (released as wasm-pandoc on npmjs.com) May 26, 2025

johanneswilm added 2 commits May 26, 2025 22:29

depencdency update

ba14fa3

0.8.0

af13710
